The Heterogeneous Collection Track at INEX 2006

Authors

  • Ingo Frommholz
  • Ray R. Larson
Abstract

While the primary INEX test collection is based on a single DTD, it is realistic to assume that most XML collections consist of documents from different sources. This leads to a heterogeneity of syntax, semantics and document genre. In order to cope with the challenges posed by such a diverse environment, the heterogeneous track was offered at INEX 2006. Within this track, we set up a collection consisting of several different and diverse collections. We defined retrieval tasks and identified a set of topics. These are the foundations for future run submissions, relevance assessments and proper evaluation of the proposed methods dealing with a heterogeneous collection.

Similar resources

The Interactive Track at INEX 2006

In this paper we describe the planned setup of the INEX 2006 interactive track. As the track has been delayed and data collection was not completed before the INEX 2006 workshop, the track will continue into 2007. Special emphasis is put on comparing XML element retrieval with passage retrieval, and on investigating differences between multiple dimensions of the search tasks.

Building and Experimenting with a Heterogeneous Collection

Today’s integrated retrieval applications retrieve documents from disparate data sources. Therefore, as part of INEX 2004, we ran a heterogeneous track to explore the experimentation with a heterogeneous collection of documents. We built a collection comprising various sub-collections, re-used topics (queries) from the sub-collections and created new topics, and participants submitted the resul...

Using Topic Shifts in XML Retrieval at INEX 2006

This paper describes the retrieval approaches used by Queen Mary, University of London in the INEX 2006 ad hoc track. In our participation, we mainly investigate an element-specific smoothing method within the language modelling framework. We adjust the amount of smoothing required for each XML element depending on its number of topic shifts to provide focused access to XML elements in the Wikip...

Unsupervised Classification of Text-Centric XML Document Collections

This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use str...

Report on the XML Mining Track at INEX 2005 and INEX 2006: Categorization and Clustering of XML Documents

This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering of XML documents. We detail these two tasks and the corpus used for this challenge and then present a summary of the different methods proposed by the participants. Finally, we compare the results obtained during the two years of the track.

Publication date: 2006